Computing DAWGs and Minimal Absent Words in Linear Time for Integer Alphabets
Abstract
The directed acyclic word graph (DAWG) of a string y is the smallest (partial) DFA that recognizes all suffixes of y, and it has only O(n) nodes and edges. We present the first O(n)-time algorithm for computing the DAWG of a given string y of length n over an integer alphabet of polynomial size in n. We also show that a straightforward modification of our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. As an application of our O(n)-time DAWG construction algorithm, we show that the set MAW(y) of all minimal absent words of y can be computed in optimal O(n + |MAW(y)|) time and O(n) working space for integer alphabets.
1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
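To make the definition of the DAWG concrete, here is a minimal sketch of the classical online suffix-automaton construction. It is not the O(n)-time integer-alphabet algorithm announced in the abstract: it stores transitions in hash maps, so it runs in expected linear time only under hashing assumptions. The names `build_dawg` and `accepts_suffix` are hypothetical and chosen for illustration.

```python
def build_dawg(y):
    """Build the DAWG (suffix automaton) of y with the classical online method.

    Assumption: transitions are kept in dicts, so this is NOT the paper's
    integer-alphabet algorithm; it only illustrates the data structure.
    Returns (transitions, suffix_links, lengths, accepting_states).
    """
    nxt = [{}]      # nxt[s][c] = target state of the edge labelled c from s
    link = [-1]     # suffix link of each state (-1 for the initial state)
    length = [0]    # length of the longest string reaching each state
    last = 0        # state that corresponds to the whole prefix read so far
    for ch in y:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        # Add the new edge along the suffix-link chain where it is missing.
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps the shorter strings of q.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Accepting states are those on the suffix-link path from `last`.
    accepting = set()
    p = last
    while p != -1:
        accepting.add(p)
        p = link[p]
    return nxt, link, length, accepting


def accepts_suffix(dawg, w):
    """Return True if w is a suffix of the string the DAWG was built from."""
    nxt, _, _, accepting = dawg
    state = 0
    for ch in w:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return state in accepting
```

For example, after `d = build_dawg("abcbc")`, the call `accepts_suffix(d, "cbc")` returns `True` while `accepts_suffix(d, "cb")` returns `False`. Because the input is only iterated and its symbols are used as dictionary keys, a list of integers works as well as a Python string.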
Related resources
Computing All Distinct Squares in Linear Time for Integer Alphabets
Given a string over an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear in the string length. As an application, we show how to compute the tree topology of the minimal augmented suffix tree in linear time. Aside from that, we elaborate an algorithm computing the longest previous table in a succinct representation usi...
Smooth words on 2-letter alphabets having same parity
In this paper, we consider smooth words over 2-letter alphabets {a, b}, where a, b are integers having the same parity, with 0 < a < b. We show that all are recurrent and that the closure of the set of factors under reversal holds for odd alphabets only. We provide a linear-time algorithm computing the extremal words w.r.t. lexicographic order. The minimal word is an infinite Lyndon word if and on...
Minimal absent words in a sliding window & applications to on-line pattern matching
An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al., 1998, Belazzougui et al., 2013, Barton et al., 2014). Minimal absent words are used for data compression (Crochemore et al., 2000, Ota and ...
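The definition of a minimal absent word given above can be checked directly on small inputs. The following is a brute-force sketch only: it enumerates all factors and is far from the linear-time algorithms cited above. The function name `minimal_absent_words` and the optional `alphabet` parameter are hypothetical, introduced for illustration.

```python
def minimal_absent_words(y, alphabet=None):
    """Brute-force minimal absent words of the string y.

    A word w is a minimal absent word of y if w does not occur in y while
    both w[1:] and w[:-1] occur in y. Assumption: when `alphabet` is not
    given, the letters occurring in y are used.
    """
    n = len(y)
    # All factors of y, including the empty word.
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    sigma = sorted(set(y)) if alphabet is None else list(alphabet)
    result = set()
    for f in factors:              # f plays the role of w[:-1]
        for a in sigma:
            w = f + a
            if w not in factors and w[1:] in factors:
                result.add(w)
    return result
```

For y = "abaab" over the alphabet {a, b}, the function returns the set {"aaa", "aaba", "bab", "bb"}. Passing a larger alphabet, for example `alphabet="abc"`, additionally reports the single letter "c", since its only proper factor is the empty word.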
Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets
We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length N in linear time, that utilizes only N log N + O(1) bits of working space, i.e., a single integer array, for constant size integer alphabets. This greatly improves the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al. CPM 2013), which requires two integer arr...
Suffix Trees for Integer Alphabets Revisited
Farach recently gave a linear-time algorithm for constructing suffix trees for integer alphabets, which solves a major open problem on index data structures. We present a new and somewhat cleaner algorithm for constructing suffix trees for integer alphabets in linear time.